Abstract


This  paper  describes  a  metanotation  for  defining  the  syntax 
and  semantics  of  a  programming  language  in  a  rigorously  formal  manner. 
Definitions  are  operational:   A  semantic  definition  is  a  set  of  string 
transformation  rules  that  operate  on  concrete  representations  of  programs 
and  their  environments. 

The  formalism  is  simple  and  easy  to  learn,  and  produces  relatively 
readable  language  descriptions.   To  illustrate  the  formalism,  and  to  facilitate 
comparison  with  other  metalanguages,  a  formal  definition  of  the  simple 
programming  language  ASPLE  is  presented.   The  method  is  compared  in  detail 
with  the  W-grammar  approach,  and  some  techniques  for  verifying  the  consistency 
of  definitions  are  discussed. 
1.   Introduction

Although  BNF  and  similar  syntactic  metanotations  have  found  wide 
acceptance,  the  same  cannot  be  said  about  formal  means  of  specifying  the 
semantics  of  programming  languages.   A  surprising  variety  of  semantic 
metanotations  exist,  and  some  of  these  have  been  used  to  define  full-size 
programming  languages,  but  none  have  achieved  widespread  use.   This  is  due 
in  part  to  the  difficulty  of  learning  the  notation,  and  in  part  to  the 
size,  complexity,  and  sheer  unreadability  of  the  definitions  themselves. 

In  this  paper  we  describe  a  syntactic  and  semantic  metanotation, 
LINGOL,  which  has  several  desirable  properties: 

-  It  is  complete.   Any  language  whose  sentences  are  strings 
of  symbols  from  a  finite  alphabet  and  whose  semantics  are 
definable  by  a  Turing  machine  can  be  entirely  defined  in 
LINGOL. 

-  It is simple. A small number of familiar mathematical
objects (sets, tuples, functions, and strings of symbols)
are related by two kinds of production rule. Standard nota-
tional conventions are used wherever possible.

-  It is readily adaptable to mechanical verification and
processing. Definitions are operational, that is, they
provide an algorithm for executing programs in the defined
language.

To illustrate LINGOL, we will use it to define a simple programming
language, ASPLE. The original definition of ASPLE is due to Cleaveland and
Uzgalis [1]. Its use here is motivated by the fact that ASPLE has become a
kind of benchmark for evaluating semantic formalisms. In a paper by Marcotty,
Ledgard, and Bochmann [3], ASPLE is defined using four very different methods:
W-grammars, Production Systems, the Vienna Definition Language, and Attribute
Grammars.

To  facilitate  comparison,  we  have  followed  the  style  of  [3]  in  our 
definition  where  possible.   Since  LINGOL  is  most  nearly  related  to  W-grammars, 
the  W-grammar  definition  in  particular  has  been  used  as  a  model. 


A longer example of LINGOL is given in [2], where a larger and
more realistic programming language is defined. The language is interactive,
block-structured, and self-extensible, and it contains a full complement of
data and control structures. For a survey of other formal definition methods
and an extensive bibliography the reader is referred to [2] and [3].

The  remainder  of  this  paper  is  organized  as  follows:   Sections  2 
and  3  contain  informal  descriptions  of  LINGOL  and  ASPLE  respectively.   Section 
4  contains  a  context-free  grammar  for  ASPLE  and  its  formal  semantic  definition 
in  LINGOL.   In  section  5,  we  compare  the  LINGOL  and  W-grammar  approaches  to 
semantics  by  means  of  a  detailed  example,  and  in  section  6  we  discuss  methods 
for  verifying  the  consistency  of  formal  definitions. 


2.   Informal  Description  of  LINGOL 

The LINGOL formalism consists of two metalanguages, L1 and L2.
L1 is a syntactic metalanguage that resembles an extended version of
BNF; L2 is a semantic metalanguage whose 'programs' resemble Markov
algorithms or SNOBOL string transformation rules. L2 is used to define
functions whose domain and range are strings or n-tuples of strings belonging
to syntactic classes defined in L1.

We will illustrate LINGOL by defining the syntax and semantics
of a very simple language consisting of arithmetic expressions. Its grammar
is written in L1 as follows:

    Exp → Int Partexp
    x: Partexp → (Op Int)*
    Op → + | =
    Int → Digit+
    Digit → 0 | 1 | ... | 9

Informally, an expression is an integer followed by a partial expression.
A partial expression is an operator-integer pair repeated zero or more
times. An operator is either the symbol + or the symbol =, and an integer
is a sequence of one or more digits.

Depending  on  the  media  used  for  presentation,  strings  of  terminal 
symbols  may  be  indicated  by  using  italics,  underlining  them,  or  enclosing 
them  in  quotes.   When  quotes  are  used,  the  quote  terminal  symbol  is  represented 
by  a  pair  of  quotes. 

Nonterminal symbols are the names of syntactic classes. Adopting
a standard mathematical convention, we capitalize class names and use the
same name in lower case (possibly with an integer suffix) to denote a member
of the class; thus expressions from the class Exp will be denoted by exp,
exp1, exp2, and so on. In addition, the grammar above indicates that x
will denote a terminal string in the class Partexp.

The semantics of our simple language can be defined in various
ways. One possibility is to define a function F that maps every expression
exp into its value. Using L2, we write

    F:  int → int
        int + int1 → Sum(int, int1)
        int = int1 → Compare(int, int1)
        int x op int1 → F(F(int x) op int1)

F(exp) is evaluated by scanning the left-hand sides of the productions
above, starting with the topmost production. When a match with the string
exp is found, the expression on the right is evaluated. Sum and Compare are
functions that take strings of digits as arguments and return a string of
digits as a result. The function Compare returns 1 if its arguments denote
the same integer, and 0 otherwise; for example:

    Compare(008, 8) = 1

Note  that  the  last  production  will  only  be  applied  when  exp  contains 
two  or  more  operators,  as  in  the  example  below. 

    F(4+4=8) = F(F(4+4)=8)
             = F(8=8)
             = 1
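The evaluation rule for F can be made concrete with a short sketch. The Python below is illustrative only and assumes things the paper does not: each L2 production becomes a regular expression tried top to bottom, greedy quantifiers stand in for the leftmost-longest binding of rule (5) below, and Python integers replace the digit-string functions Sum and Compare.

```python
import re

INT = r"[0-9]+"                 # Int -> Digit+
PARTEXP = r"(?:[+=][0-9]+)*"    # Partexp -> (Op Int)*


def Sum(a: str, b: str) -> str:
    # Stand-in for the paper's Sum on digit strings
    return str(int(a) + int(b))


def Compare(a: str, b: str) -> str:
    # "1" if both digit strings denote the same integer, else "0"
    return "1" if int(a) == int(b) else "0"


def F(exp: str) -> str:
    """Try the four productions of F in order; the first match wins."""
    if re.fullmatch(INT, exp):                                   # int -> int
        return exp
    m = re.fullmatch(rf"({INT})\+({INT})", exp)                  # int + int1
    if m:
        return Sum(m[1], m[2])
    m = re.fullmatch(rf"({INT})=({INT})", exp)                   # int = int1
    if m:
        return Compare(m[1], m[2])
    m = re.fullmatch(rf"({INT})({PARTEXP})([+=])({INT})", exp)   # int x op int1
    if m:
        return F(F(m[1] + m[2]) + m[3] + m[4])
    raise ValueError(f"F({exp!r}) is undefined")


print(F("4+4=8"))   # 1
```

As in the text, the last production fires only when two or more operators are present, and it recurses on the shorter prefix "int x".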

An alternative is to define the semantics of a language by an interpreter
that executes programs in that language. The interpreter is defined
by a function I that maps the current state $s_i$ of the interpreter into its
successor state $s_{i+1}$. Given an initial state $s_0$, I determines the computation
$s_0, s_1, \ldots, s_n$; when $I(s_n)$ is undefined, the computation is said to terminate,
and $s_n$ is the terminal state of the computation. For our example language,
states correspond to expressions and terminal states to integers. The
function I is defined by

    I:  int + int1 x → Sum(int, int1) x
        int = int1 x → Compare(int, int1) x

The initial state $s_0$ = 4+4=8 determines the following computation:

    I(s_0) = s_1 = Sum(4, 4)=8 = 8=8
    I(s_1) = s_2 = Compare(8, 8) = 1

Since I(1) is undefined, $s_2$ = 1 is the terminal state.
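The interpreter view can be sketched the same way: a step function that returns the successor state, or None where I is undefined, and a driver that collects $s_0, s_1, \ldots, s_n$. As before, the regular expressions and the use of Python integers for Sum and Compare are assumptions of this sketch, not part of LINGOL.

```python
import re

INT = r"[0-9]+"
PARTEXP = r"(?:[+=][0-9]+)*"


def I(s: str):
    """One step of the interpreter; returns None where I is undefined."""
    m = re.fullmatch(rf"({INT})\+({INT})({PARTEXP})", s)     # int + int1 x
    if m:
        return str(int(m[1]) + int(m[2])) + m[3]
    m = re.fullmatch(rf"({INT})=({INT})({PARTEXP})", s)      # int = int1 x
    if m:
        return ("1" if int(m[1]) == int(m[2]) else "0") + m[3]
    return None


def computation(s0: str) -> list[str]:
    """Return s_0, s_1, ..., s_n, stopping when I(s_n) is undefined."""
    states = [s0]
    while (nxt := I(states[-1])) is not None:
        states.append(nxt)
    return states


print(computation("4+4=8"))   # ['4+4=8', '8=8', '1']
```

Unlike F, I never recurses: each step rewrites only the leftmost operator, and termination is simply the absence of a matching production.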


The definitions of the functions F and I illustrate the L2 component of LINGOL.
The syntax and semantics of the L2 metalanguage are described below:

A description in L2 is a function name followed by a sequence of
semantic productions of the form p → e, where p is a pattern or an n-tuple
$(p_1, p_2, \ldots, p_n)$ of patterns and e is a string expression or an n-tuple of
string expressions. A pattern is a sequence of string variables and strings
of terminal symbols; a string expression is a sequence of string variables,
terminal strings, and string-valued functions with string expressions as
arguments.

To find the value F(x) of a function defined by a set of L2 productions,
we match x with the left side of one of the productions and evaluate
the right side. The values of string variables in the expression are determined
by the pattern match. If no match can be found or if the string expression
is undefined, F(x) is undefined.

The semantics and context-sensitive syntax of L2 are specified further
by the five rules below. Rules (3) through (5) ensure that a given set of
productions determines at most one value F(x).


1.  Every  variable  must  be  associated  with  a  syntactic 
class. 

2.  If  the  same  variable  appears  more  than  once  on  the 
left  side,  it  must  match  the  same  string  of  symbols 
each  time  it  occurs. 

3.  Every  variable  that  appears  on  the  right  side  must 
appear  on  the  left  side. 

4.  If  more  than  one  rule  matches  a  given  string,  the 
first  rule  in  sequence  is  chosen. 

5.  If a pattern matches a string x in more than one
way, the parse that assigns the longest substring
of x to the first (leftmost) pattern variable is
chosen. If there are several such parses, the ones
that assign the longest string to the second variable
are selected, and so on until a unique binding of
pattern variables to strings is obtained.

Patterns and tuples of patterns have the same semantics; in particular,
rules (4) and (5) apply to an entire tuple and not to individual
patterns within the tuple. For example, suppose we wish to evaluate
Compare(0012, 012), where the function Compare is defined by

    Compare:  (zeros int, zeros2 int) → 1
              (int, int2) → 0

and the syntactic class Zeros is defined by

    Zeros → 0*

Both productions match, but by rule (4) the first one is selected. Variable
zeros is bound to 00 by rule (5), int matches both occurrences of 12 by
rule (2), and zeros2 is consequently bound to 0. Evaluating the right
side, we have Compare(0012, 012) = 1.
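Rules (2), (4), and (5) have close analogues in backtracking regular expressions: a backreference forces two occurrences of a variable to match the same string, and greedy groups tried left to right prefer longer bindings for earlier variables. The sketch below is a hypothetical encoding (tuples joined with commas, one compiled pattern per production) that reproduces the Compare example; for simple patterns like this one the regex engine's binding coincides with rule (5), but that correspondence is an assumption, not a general guarantee.

```python
import re
from typing import Callable, Optional

# One L2-style production: a pattern over the comma-joined argument tuple
# and a function that builds the right side from the match.
Production = tuple[re.Pattern, Callable[[re.Match], str]]


def apply(productions: list[Production], args: tuple[str, ...]) -> Optional[str]:
    subject = ",".join(args)              # assumption: tuples encoded with commas
    for pattern, rhs in productions:      # rule (4): first matching production
        m = pattern.fullmatch(subject)
        if m:
            return rhs(m)
    return None                           # no match: the value is undefined


COMPARE: list[Production] = [
    # (zeros int, zeros2 int) -> 1 ; \2 forces both ints to be equal (rule 2)
    (re.compile(r"(0*)([0-9]+),(0*)\2"), lambda m: "1"),
    # (int, int2) -> 0
    (re.compile(r"([0-9]+),([0-9]+)"), lambda m: "0"),
]

print(apply(COMPARE, ("0012", "012")))                  # 1
print(COMPARE[0][0].fullmatch("0012,012").groups())     # ('00', '12', '0')
```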


3.   Informal  Description  of  ASPLE 

An  ASPLE  program  consists  of  a  sequence  of  declarations  followed 
by  a  sequence  of  executable  statements.   Declarations  serve  to  associate  a 
'mode'  with  each  identifier  used  in  the  program.   There  are  five  types  of 
statement:   assignment  statements,  if-then-else  conditionals,  while-do 
loops, input and output statements. Statements contain expressions composed
of Boolean and integer constants, identifiers, and the operators +, *, =, and
≠. The operators + and * placed between integer values represent addition
and multiplication respectively; between Boolean values they represent the
logical 'or' and 'and' operations. The operators = and ≠ take integer
arguments and return a Boolean value. Every identifier used in an expression
must appear in exactly one declaration.

The  example  ASPLE  program  below  is  taken  from  [3]. 

    begin
        int X, Y, Z;
        input X;
        Y := 1;
        Z := 1;
        if (X ≠ 0) then
            while (Z ≠ X) do
                Z := Z + 1;
                Y := Y * Z
            end
        fi;
        output Y
    end

The program above reads a positive integer value X from an input
file, computes its factorial, and prints the result Y on an output file.
Variables X, Y, and Z are declared to reference only integer values; their
mode is thus reference-to-integer.

Just as integers can be assigned to variables of mode reference-to-integer,
references to integers can be stored in variables of mode
reference-to-reference-to-integer, and so on for as many levels of
indirection as are desired. Consider the program below:

    begin
        ref ref int A, B;
        ref int C, D;
        int E;
        E := 50;
        C := E;
        A := C;
        D := A;
        input D;
        output D
    end

In this program the integer 50 is assigned to variable E, a reference
to E is assigned to variable C, and a reference to C is assigned to
variable A. Since D expects a value of mode reference-to-integer, A is
'dereferenced' twice and the resulting reference to E is stored in variable D.
The input statement reads an integer value into the variable E, and the output
statement prints the value of E. Assuming that the value input was 25,
the final state of memory is as shown in Figure 3.1. B is undefined.
Figure 3.1.   ASPLE Memory Structure

The Boolean or integer constant obtained by repeatedly dereferencing
a variable is called the 'primitive value' of that variable,
and its mode is called the 'primitive mode' of the variable. In the
example above, the primitive mode of variables A, C, D, and E is integer,
and their primitive value is 25 at program termination.
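Primitive values and modes can be modelled with a dictionary from identifiers to (mode, contents) pairs. The Python sketch below is a hypothetical encoding of the final memory of the program above, as described in the text: a string value names another identifier, an integer is a constant, and None is undefined. Boolean constants are not exercised here.

```python
# Hypothetical model of the final memory of the program above:
# each identifier maps to (mode, contents).
memory = {
    "A": ("ref ref ref int", "C"),
    "B": ("ref ref ref int", None),
    "C": ("ref ref int", "E"),
    "D": ("ref ref int", "E"),
    "E": ("ref int", 25),
}


def primitive_value(ident: str):
    """Dereference repeatedly until a constant (or undefined) is reached."""
    contents = memory[ident][1]
    while isinstance(contents, str):      # contents names another identifier
        contents = memory[contents][1]
    return contents


def primitive_mode(ident: str) -> str:
    """Strip every 'ref' from the declared mode."""
    return memory[ident][0].split()[-1]


for name in "ACDE":
    print(name, primitive_mode(name), primitive_value(name))   # e.g. A int 25
```

The number of 'ref' words in a mode is the quantity counted as $n_L$ or $n_R$ in the assignment rules that follow.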

An  assignment  statement  is  legal  if 

(a)  the  right  side  is  defined, 

(b)  both  sides  have  the  same  primitive  mode,  and 

(c)  if $n_L$ and $n_R$ are the numbers of occurrences of 'reference-to'
in the modes of the left and right sides respectively, then
$n_L - 1 \le n_R$.
A  legal  assignment  statement  is  executed  as  follows: 

(1)  If the right side is a constant or identifier
and $n_L - 1 = n_R$, then the value on the right is
assigned to the variable on the left.

(2)  If the right side is an identifier and $n_L - 1 < n_R$,
the identifier is dereferenced until a value is
obtained whose mode contains $n_L - 1$ occurrences of
'reference-to', and this value is assigned.

(3)  If the right side is an expression other than a
constant or identifier, $n_R = 0$. Identifiers in
the expression are replaced by their primitive
values, the expression thus formed is evaluated,
and the resulting constant is assigned.
In the example program below, the first three statements illustrate
rules (1), (2), and (3) respectively; the last three statements
are illegal since they violate conditions (a), (b), and (c) respectively.


    begin
        ref int C, D;
        int E, F, G;
        bool H;
        E := 10;      [n_L = 1, n_R = 0]
        F := E;       [n_L = 1, n_R = 1]
        G := (E);     [n_L = 1, n_R = 0]
        C := D;       [n_L = 2, n_R = 2]
        H := E;       [n_L = 1, n_R = 1]
        C := (E)      [n_L = 2, n_R = 0]
    end
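Conditions (a) through (c) and rules (1) through (3) together form a small decision procedure. In the hypothetical Python sketch below, the caller supplies $n_L$, $n_R$, and two facts that would otherwise be computed from memory (whether the right side is defined and whether the primitive modes agree); the order in which the conditions are checked is a choice of the sketch. It reproduces the annotations of the six statements above.

```python
def assignment_case(n_L: int, n_R: int, rhs_simple: bool,
                    rhs_defined: bool = True,
                    same_primitive_mode: bool = True) -> str:
    """Classify an assignment by conditions (a)-(c) and rules (1)-(3).

    rhs_simple is True when the right side is a constant or an identifier.
    """
    if not rhs_simple:
        n_R = 0                      # rule (3): an expression yields a constant
    if not rhs_defined:
        return "illegal (a)"
    if not same_primitive_mode:
        return "illegal (b)"
    if n_L - 1 > n_R:
        return "illegal (c)"
    if not rhs_simple:
        return "rule (3)"
    return "rule (1)" if n_L - 1 == n_R else "rule (2)"


examples = [            # the six statements of the program above
    ("E := 10", assignment_case(1, 0, True)),
    ("F := E", assignment_case(1, 1, True)),
    ("G := (E)", assignment_case(1, 0, False)),
    ("C := D", assignment_case(2, 2, True, rhs_defined=False)),
    ("H := E", assignment_case(1, 1, True, same_primitive_mode=False)),
    ("C := (E)", assignment_case(2, 0, False)),
]
for stmt, case in examples:
    print(f"{stmt:9} {case}")
```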

The argument of an input statement must be an identifier whose
primitive mode is the same as the mode of the next value to be input. The
identifier is dereferenced to obtain a reference to a constant, and the
input value is assigned to the referenced location.

Expressions appearing in other statements are always evaluated
or dereferenced to a constant as the first step in executing the statement.
4.   Formal Definition of ASPLE

A formal grammar for ASPLE programs is given in Figure 4.1.
It is an almost direct translation into L1 of the BNF grammar on page
195 of [3], except that two-word nonterminals have been renamed, productions
[B18] and [B19] are slightly changed, and some compression of the grammar
has been achieved by using the Kleene * and + operators.

Since we intend to define the semantics of ASPLE by means of an
interpreter, we need to extend the ASPLE grammar to include a definition
of the class of computational states. The definition consists of the
productions [B23] through [B35] in Figure 4.2. [B36] through [B43] are
stand-alone productions that define syntactic classes used by the interpreter
but not by other syntax rules. In particular, [B38] through [B43] define
implementation-dependent restrictions on the length of programs, memory,
integer constants, identifiers, and files.

For  convenience  in  defining  classes  of  fixed-length  strings,  we 
let  N*k  denote  a  sequence  of  k  instances  of  the  syntactic  class  N.   Thus 
Digit*10  is  the  class  of  all  10-digit  integers,  and  Digit*10  Digit+  is  the 
class  of  all  integers  with  more  than  10  digits. 
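Because N*k is a fixed repetition count, classes such as Digit*10 Digit+ correspond to regular expressions with a counted quantifier. In the Python sketch below, encoding Digit as [0-9] and Letter as [A-Z] follows Figure 4.1; the helper names are hypothetical.

```python
import re

# N*k (k instances of N) corresponds to the regex quantifier {k}.
LONGINT = re.compile(r"[0-9]{10}[0-9]+")   # [B40] Digit*10 Digit+
LONGID = re.compile(r"[A-Z]{6}[A-Z]+")      # [B41] Letter*6 Letter+


def is_longint(s: str) -> bool:
    return LONGINT.fullmatch(s) is not None


def is_longid(s: str) -> bool:
    return LONGID.fullmatch(s) is not None


print(is_longint("1234567890"), is_longint("12345678901"))   # False True
print(is_longid("ABCDEF"), is_longid("ABCDEFG"))             # False True
```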
Interpreter  States 

A  state  of  the  ASPLE  interpreter  is  represented  by  a  program  or  a 
sequence  of  declarations  and  statements,  followed  by  a  snapshot  that  describes 
the  current  contents  of  memory  and  of  the  input  and  output  files  associated 
with  every  program.   Memory  also  serves  as  a  symbol  table:   Each  entry 
includes  the  mode  of  an  identifier  as  well  as  its  contents.   An  identifier 
may  be  undefined,  or  may  contain  (refer  to)  a  Boolean  constant,  an  integer 
constant,  or  a  reference  to  another  identifier  as  in  the  example  below: 
    memory; A ref bool undefined; B ref int 12; C ref ref int B;
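The snapshot format is regular enough to be read back mechanically, one location (Id Mode Box ;) at a time. The Python sketch below is hypothetical: it assumes single spaces between lexemes, modes written word by word (ref int), and the literal text "memory;" immediately before the locations, with "infile" following them.

```python
import re

# One memory location: Id Mode Box ;
LOC = re.compile(r"([A-Z]+) ((?:ref )*(?:int|bool)) (\S+?);")


def locations(snap: str) -> list[tuple[str, ...]]:
    """Return (identifier, mode, box) for each location of a snapshot."""
    memory_part = snap.split("memory;", 1)[1].split("infile", 1)[0]
    return [m.groups() for m in LOC.finditer(memory_part)]


snap = "memory; A ref bool undefined; B ref int 12; C ref ref int B; infile outfile"
for ident, mode, box in locations(snap):
    print(ident, "|", mode, "|", box)
```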
[B01]  Program → begin Decls ; Stmts end

[Declarations]

[B02]  Decls → (Declaration ;)* Declaration
[B03]  Stmts → (Statement ;)* Statement
[B04]  Declaration → Mode Idlist
[B05]  Mode → bool | int | ref Mode
[B06]  Idlist → (Id ,)* Id

[Statements]

[B07]  Statement → Assignment | Conditional | Loop | Transput
[B08]  Assignment → Id := Exp
[B09]  Conditional → if Exp then Stmts fi
                   | if Exp then Stmts else Stmts fi
[B10]  Loop → while Exp do Stmts end
[B11]  Transput → input Id
                | output Exp

[Expressions]

[B12]  Exp → Factor | Exp + Factor
[B13]  Factor → Primary | Factor * Primary
[B14]  Primary → Id | Constant | ( Exp ) | ( Compare )
[B15]  Compare → Exp = Exp | Exp ≠ Exp

[Constants and Identifiers]

[B16]  Constant → Bool | Int
[B17]  Bool → true | false
[B18]  Int → Digits Digit
[B19]  Digits → Digit*
[B20]  Digit → 0 | 1 | ... | 9
[B21]  Id → Letter+
[B22]  Letter → A | B | ... | Z

Figure 4.1.   Syntax of ASPLE Programs
[States]

[B23]  State → Initial | Declaring | Executing | Final
[B24]  Initial → Program Snap
[B25]  Declaring → Decls ; Stmts ; Snap
[B26]  Executing → Stmts ; Snap
[B27]  Final → Snap | Lexemes error Lexemes
[B28]  Snap → memory ; Loc* infile Record* outfile Record*
[B29]  Record → Constant ;
[B30]  Loc → Id Mode Box ;
[B31]  Box → Val | undefined
[B32]  Val → Id | Constant
[B33]  x, y, z: Lexemes → (Box | Operator | Keyword | Mode)*
[B34]  Operator → ; | := | , | + | * | = | ≠ | ( | )
[B35]  Keyword → if | then | else | fi | while | do | end | input | output
               | memory | infile | outfile
[B36]  Zero → 0*
[B37]  Con → Constant | undefined

[Limitations]

[B38]  Longprogram → Lexeme*10000 Lexeme+
[B39]  Longmemory → Loc*2000 Loc+
[B40]  Longint → Digit*10 Digit+
[B41]  Longid → Letter*6 Letter+
[B42]  Maxint → 4095
[B43]  Longfile → Record*500 Record+

Figure 4.2.   Syntax of ASPLE States
In the W-grammar for ASPLE the same state of memory would be
represented by

    memory loc A has ref bool refers undefined end
           loc B has ref int refers 12 end
           loc C has ref ref int refers B end

We  have  chosen  an  abbreviated  representation  of  memory  in  the 
belief  that  a  generally  useful  formal  definition  should  be  tied  closely  to 
concrete  programs  and  to  concrete  representations  of  memory  of  the  sort  that 
might  be  generated  by  a  symbolic  dump  routine.   Such  a  representation  ought 
to  be  both  compact  and  syntactically  similar  to  the  programs  it  accompanies. 

A compact state description permits example computations that
are not excessively bulky. For example, the execution of the ASPLE program

    begin int X; X := 0 end

is represented by the following sequence of states:

    [S1]  begin int X; X := 0 end memory; infile outfile
    [S2]  int X; X := 0; memory; infile outfile
    [S3]  X := 0; memory; X ref int undefined; infile outfile
    [S4]  memory; X ref int 0; infile outfile

The strings [S1], [S2], [S3], and [S4] belong respectively to the
subsets Initial, Declaring, Executing, and Final of the set of states.
Interpreter  Definition 

The interpreter for ASPLE is defined by a state transition function
I and seven auxiliary functions E, Plus, Times, Equal, Unequal, Suc, and
Pred. The last two are the successor and predecessor functions for the
class of non-negative integers. E is an expression evaluator, and the other
functions define the ASPLE operators +, *, =, and ≠. The domains and
ranges of the functions are as follows:
    I:        State - Final → State
    E:        (Exp, State) → Con
    Plus:     (Con, Con) → Con
    Times:    (Con, Con) → Con
    Equal:    (Con, Con) → Con                (4.1)
    Unequal:  (Con, Con) → Con
    Suc:      Int → Int
    Pred:     Int - Zero → Int

Con is defined by [B37] as the class of integer and Boolean constants
together with undefined; when arithmetic overflow occurs, or a binary
operator is supplied the wrong arguments, or an undefined identifier is
used in an expression, the result undefined is passed through the expression
evaluation process and ultimately returned by E.

The definition of I consists of the semantic productions [101]
through [129] displayed in Figures 4.3 and 4.4. The definition of function E
is shown in Figure 4.5, Figure 4.6 contains the definitions of Plus, Times,
Equal, and Unequal, and Figure 4.7 contains the definitions of Suc and Pred.
We will consider each of these definitions in turn.

In the definition of I, productions [101] through [105] serve to
enforce implementation-dependent limitations on ASPLE programs. [101] through
[104] cause a transition to an error state when a 'compile time' error is
detected: excessive program length, too many declarations, or an oversize
constant or identifier. [101], [103], and [104] apply only to the initial state, but
[102] may be invoked at any time while declarations are being processed. [105]
causes a transition to an error state when the output file overflows during
execution.
I:  [Interpreter]

[Limitations]

[101]  longprog snap → error PROGRAM TOO LONG
[102]  x longmemory y → error EXCESSIVE MEMORY REQUIRED
[103]  begin x longint y → error OVERSIZE INTEGER
[104]  begin x longid y → error IDENTIFIER TOO LONG
[105]  x outfile longfile → error OUTPUT FILE OVERFLOW

[Declarations]

[106]  begin decls ; stmts end x → decls ; stmts ; x
[107]  mode id , idlist ; x → mode id ; mode idlist ; x
[108]  mode id ; x ; id mode2 y
           → x ; id mode2 y error id ALREADY DECLARED
[109]  mode id ; x memory; y
           → x memory; id ref mode undefined; y

[Assignment]

[110]  id := int ; x ; id ref int box y
           → x ; id ref int int y
[111]  id := bool ; x ; id ref bool box y
           → x ; id ref bool bool y
[112]  id := id2 ; x ; id ref mode box y ; id2 mode val z
           → x ; id ref mode id2 y ; id2 mode val z
[113]  id := id2 ; x ; id2 mode val y ; id ref mode box z
           → x ; id2 mode val y ; id ref mode id2 z
[114]  id := id2 ; x ; id2 mode val y
           → id := val ; x ; id2 mode val y
[115]  id := box ; x → x error ILLEGAL ASSIGNMENT id := box
[116]  id := exp ; x → id := E(exp, x) ; x

Figure 4.3.   ASPLE Interpreter, Part I
[Conditionals]

[117]  if true then stmts fi ; x → stmts ; x
[118]  if false then stmts fi ; x → x
[119]  if true then stmts else stmts2 fi ; x → stmts ; x
[120]  if false then stmts else stmts2 fi ; x → stmts2 ; x
[121]  if con then x → x error ILLEGAL CONDITIONAL
[122]  if exp then x → if E(exp, x) then x

[Loops]

[123]  while exp do stmts end ; x
           → if exp then stmts ; while exp do stmts end fi ; x

[Transput]

[124]  input id ; x ; id mode id2 ; y
           → input id2 ; x ; id mode id2 ; y
[125]  input id ; x infile constant ; y
           → id := constant ; x infile constant ; y
[126]  input id ; x → x error ATTEMPT TO READ EMPTY FILE
[127]  output constant ; x → x constant ;
[128]  output undefined ; x → x error OUTPUT UNDEFINED
[129]  output exp ; x → output E(exp, x) ; x

Figure 4.4.   ASPLE Interpreter, Part II


E:  [Expression Evaluation]

[E1]  (exp + factor, x) → Plus(E(exp, x), E(factor, x))
[E2]  (factor * primary, x) → Times(E(factor, x), E(primary, x))
[E3]  (id, x ; id mode val ; y) → E(val, x ; y)
[E4]  (id, x) → undefined
[E5]  (constant, x) → constant
[E6]  (( exp ), x) → E(exp, x)
[E7]  (( exp = exp2 ), x) → Equal(E(exp, x), E(exp2, x))
[E8]  (( exp ≠ exp2 ), x) → Unequal(E(exp, x), E(exp2, x))

Figure 4.5.   Expression Evaluation
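The evaluator E can be sketched over a pre-parsed expression tree, which makes the lookup rule [E3] easy to see: each lookup searches memory and continues in memory with the matched location removed, so a chain of references is followed at most once per location. In the Python sketch below, the tuple representation, the operator keys (with != for ≠), and the simplified integer stand-ins for Plus and Times (overflow above 4095 becomes undefined) are assumptions; parentheses ([E6]) have no counterpart in the tree.

```python
# Expressions: a constant ("12", "true"), an identifier ("X"), or a tuple
# (op, left, right). Memory: a list of (identifier, mode, box) locations.
# None stands for undefined.
MAXINT = 4095
BOOLS = ("true", "false")


def _ints(a, b):
    return a is not None and b is not None and a.isdigit() and b.isdigit()


def _plus(a, b):                      # simplified stand-in for Plus
    if _ints(a, b):
        s = int(a) + int(b)
        return str(s) if s <= MAXINT else None
    if a in BOOLS and b in BOOLS:
        return "false" if a == b == "false" else "true"
    return None


def _times(a, b):                     # simplified stand-in for Times
    if _ints(a, b):
        p = int(a) * int(b)
        return str(p) if p <= MAXINT else None
    if a in BOOLS and b in BOOLS:
        return "true" if a == b == "true" else "false"
    return None


def _equal(a, b):
    if not _ints(a, b):
        return None
    return "true" if int(a) == int(b) else "false"


def _unequal(a, b):
    r = _equal(a, b)
    return None if r is None else ("false" if r == "true" else "true")


OPS = {"+": _plus, "*": _times, "=": _equal, "!=": _unequal}


def E(exp, memory):
    if isinstance(exp, tuple):                         # [E1], [E2], [E7], [E8]
        op, left, right = exp
        return OPS[op](E(left, memory), E(right, memory))
    if exp in BOOLS or exp.isdigit():                  # [E5]
        return exp
    for i, (ident, _mode, box) in enumerate(memory):   # [E3]
        if ident == exp and box is not None:
            return E(box, memory[:i] + memory[i + 1:])
    return None                                        # [E4]


memory = [("X", "ref int", "5"), ("Y", "ref ref int", "X"), ("Z", "ref int", None)]
print(E(("+", "Y", "2"), memory))   # 7
print(E(("*", "Z", "3"), memory))   # None
```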
Plus:  [Addition and Boolean 'or']

[P1]  (int, zero) → int
[P2]  (maxint, int) → undefined
[P3]  (int, int2) → Plus(Suc(int), Pred(int2))
[P4]  (false, false) → false
[P5]  (bool, bool2) → true
[P6]  (con, con2) → undefined

Times:  [Multiplication and Boolean 'and']

[T1]  (int, zero) → 0
[T2]  (int, digit) → Plus(Times(int, Pred(digit)), int)
[T3]  (int, digits digit) → Plus(Times(int 0, digits), Times(int, digit))
[T4]  (true, true) → true
[T5]  (bool, bool2) → false
[T6]  (con, con2) → undefined

Equal:  [Compare Integers for Equality]

[EQ1]  (zero int, zero2 int) → true
[EQ2]  (int, int2) → false
[EQ3]  (con, con2) → undefined

Unequal:  [Compare Integers for Inequality]

[U1]  (zero int, zero2 int) → false
[U2]  (int, int2) → true
[U3]  (con, con2) → undefined

Figure 4.6.   Operators
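The operators of Figure 4.6 translate almost line for line into Python when constants are kept as strings and undefined is None. This sketch is hypothetical: suc and pred are integer stand-ins for the string-rewriting Suc and Pred of Figure 4.7 (so leading zeros are dropped, and results can differ from the productions for arguments written with leading zeros), and the recursion of [P3] is written as a loop.

```python
MAXINT = "4095"
BOOLS = ("true", "false")


def is_int(c):
    return c is not None and c.isdigit()


def is_zero(c):                      # an Int that matches Zero (0*)
    return is_int(c) and set(c) == {"0"}


def suc(i):                          # stand-in for Suc
    return str(int(i) + 1)


def pred(i):                         # stand-in for Pred
    return str(int(i) - 1)


def plus(a, b):
    while is_int(a) and is_int(b):
        if is_zero(b):
            return a                                  # [P1]
        if a == MAXINT:
            return None                               # [P2] overflow
        a, b = suc(a), pred(b)                        # [P3]
    if a == b == "false":
        return "false"                                # [P4]
    if a in BOOLS and b in BOOLS:
        return "true"                                 # [P5]
    return None                                       # [P6]


def times(a, b):
    if is_int(a) and is_int(b):
        if is_zero(b):
            return "0"                                # [T1]
        if len(b) == 1:
            return plus(times(a, pred(b)), a)         # [T2]
        return plus(times(a + "0", b[:-1]), times(a, b[-1]))   # [T3]
    if a == b == "true":
        return "true"                                 # [T4]
    if a in BOOLS and b in BOOLS:
        return "false"                                # [T5]
    return None                                       # [T6]


def equal(a, b):                                      # [EQ1]-[EQ3]
    if is_int(a) and is_int(b):
        return "true" if a.lstrip("0") == b.lstrip("0") else "false"
    return None


def unequal(a, b):                                    # [U1]-[U3]
    r = equal(a, b)
    return None if r is None else ("false" if r == "true" else "true")


print(times("12", "34"))    # 408
print(plus("4095", "1"))    # None
```

Since [T2] and [T3] are built from Plus, an overflow detected by [P2] anywhere inside a multiplication propagates to the result through [P6].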
Suc:  [Successor Function]

[S01]  digits 0 → digits 1
[S02]  digits 1 → digits 2
  ...
[S09]  digits 8 → digits 9
[S10]  9 → 10
[S11]  int 9 → Suc(int) 0

Pred:  [Predecessor Function]

[PR01]  digits 1 → digits 0
[PR02]  digits 2 → digits 1
  ...
[PR09]  digits 9 → digits 8
[PR10]  10 → 9
[PR11]  int 0 → Pred(int) 9

Figure 4.7.   Unary Functions
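Suc and Pred work on the last digit and recurse only on a carry or borrow, so they map directly onto recursive string functions. The Python sketch below follows the productions in order; raising an exception for members of Zero is an assumption standing in for 'undefined'.

```python
def suc(s: str) -> str:
    """Successor on digit strings, following [S01]-[S11] in order."""
    last = s[-1]
    if last != "9":
        return s[:-1] + str(int(last) + 1)     # [S01]-[S09]: digits d -> digits d+1
    if s == "9":
        return "10"                            # [S10]
    return suc(s[:-1]) + "0"                   # [S11]: int 9 -> Suc(int) 0


def pred(s: str) -> str:
    """Predecessor on digit strings, following [PR01]-[PR11]."""
    if set(s) <= {"0"}:
        raise ValueError("Pred is undefined on Zero")
    last = s[-1]
    if last != "0":
        return s[:-1] + str(int(last) - 1)     # [PR01]-[PR09]
    if s == "10":
        return "9"                             # [PR10]
    return pred(s[:-1]) + "9"                  # [PR11]: int 0 -> Pred(int) 9


print(suc("199"), pred("100"))   # 200 99
```

[PR10] is what keeps Pred(10) from producing 09: without it, [PR11] would yield Pred(1) 9 = 09.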
The remaining productions are organized to reflect the structure
of the grammar. [106] initializes the execution process by reducing the
program to a sequence of declarations and statements. Other productions
operate in one of two ways: they remove a declaration or statement from the
left of the sequence, execute it, and modify the snapshot accordingly, or
they replace a declaration or statement with equivalent ASPLE code to which
another production applies. Both modes of operation are illustrated by the
sample computation below. The productions used are listed in their order of
application.

    begin int X,X end memory; infile outfile                 [106]
    int X,X; memory; infile outfile                           [107]
    int X; int X; memory; infile outfile                      [109]
    int X; memory; X ref int undefined; infile outfile        [108]
    memory; X ref int undefined ... error ...

The order of productions in a definition may be significant. For
example, the domain of [108] is a subset of the domain of [109]; if these two
productions were interchanged, the second one would never be applied and
redundant declarations would not be detected.
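The declaration productions can be run as ordinary string rewriting. The hypothetical Python sketch below encodes [107]-[109] as regular expressions over a Declaring state (with ", " between identifiers and single spaces elsewhere), applies the first one that matches, and records which production fired; [106] is omitted.

```python
import re

MODE = r"(?:ref )*(?:int|bool)"
ID = r"[A-Z]+"

# (label, pattern, right side) for [107]-[109], tried in this order (rule 4).
RULES = [
    # [107]  mode id , idlist ; x  ->  mode id ; mode idlist ; x
    ("107",
     re.compile(rf"(?P<mode>{MODE}) (?P<id>{ID}), (?P<idlist>{ID}(?:, {ID})*); (?P<x>.*)"),
     lambda m: f"{m['mode']} {m['id']}; {m['mode']} {m['idlist']}; {m['x']}"),
    # [108]  mode id ; x ; id mode2 y  ->  x ; id mode2 y error id ALREADY DECLARED
    ("108",
     re.compile(rf"(?P<mode>{MODE}) (?P<id>{ID}); (?P<x>.*); (?P=id) (?P<mode2>{MODE}) (?P<y>.*)"),
     lambda m: f"{m['x']}; {m['id']} {m['mode2']} {m['y']} error {m['id']} ALREADY DECLARED"),
    # [109]  mode id ; x memory; y  ->  x memory; id ref mode undefined; y
    ("109",
     re.compile(rf"(?P<mode>{MODE}) (?P<id>{ID}); (?P<x>.*)memory; (?P<y>.*)"),
     lambda m: f"{m['x']}memory; {m['id']} ref {m['mode']} undefined; {m['y']}"),
]


def declare(state: str):
    """Rewrite a Declaring state until no declaration production applies."""
    trace = []
    while True:
        for label, pattern, rhs in RULES:
            m = pattern.fullmatch(state)
            if m:
                state = rhs(m)
                trace.append(label)
                break
        else:                            # no production matched
            return state, trace


final, used = declare("int X, X; X := 0; memory; infile outfile")
print(used)     # ['107', '109', '108']
print(final)
```

The trace matches the sample computation above. Moving the [109] entry ahead of [108] in RULES reproduces the problem just described: the second X would simply receive a second location.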

Productions [110] through [129] define the semantics of statement
execution. Since the definition of the assignment statement is the most
complex, we will discuss it at some length; the remaining definitions will be
left to the reader.

The  productions  that  define  assignment  are  arranged  in  three  groups 
that  correspond  to  the  three  cases  of  the  informal  description  of  assignment 
in  section  3: 
(1)  If the right side of the assignment is a constant
or identifier and $n_L - 1 = n_R$, then the value on the
right is stored in the variable on the left.

a)  Integer constants are stored by [110].

b)  Boolean constants are stored by [111].

c)  Identifiers that are defined are stored
by [112] or [113].

(2)  If the right side is an identifier that has been
defined but does not satisfy (1), [114] is applied
to replace the right side with its value. If the
resulting assignment statement still fails to
satisfy (1), [114] will dereference the right side
again, and this will continue until $n_L - 1 = n_R$ or the
right side is a constant.

(3)  When the right side is an expression other than a
constant, identifier, or undefined, [116] causes
the expression to be replaced by its value (or
undefined). Single-item expressions that fail to
satisfy (1) or (2) are intercepted by [115], which
generates an appropriate error message.

The  process  of  executing  an  assignment  statement  is  represented  by 
the  state  transition  graph  in  Figure  4.8.   Each  state  in  the  diagram  corresponds 
to  the  set  of  interpreter  states  matched  by  one  of  the  productions  [110]  through 
[116].   For  example,  if  we  are  in  state  16,  production  [116]  will  be  applied 
and  the  resulting  interpreter  state  will  belong  to  state  10,  11,  or  15  of  the 
diagram.   Either  [110],  [111],  or  [115]  will  be  applied  subsequently. 

From the graph it is easy to verify that an assignment statement
will eventually be processed, that is, that its execution always terminates.
The only circular path passes through state 14, and every time [114] is
applied $n_R$ decreases. When $n_R = 0$ another state is reached and
processing is complete.
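The three cases and the termination argument can be sketched in Python. This is an illustration under stated assumptions, not a transcription of [110] through [116]: memory is modelled as a dictionary from variable names to (mode, contents) pairs rather than as a string, None stands in for undefined, and the names `assign` and `mode_of` as well as the exact error text are hypothetical. Each pass through the loop corresponds to one application of [114] and strips one ref from the mode of the right side, so on a memory without circular pointer chains the loop must stop.

```python
from typing import Dict, Optional, Tuple, Union

Const = Union[int, bool]
# Hypothetical memory model: name -> (mode, contents); None means "undefined".
Memory = Dict[str, Tuple[str, Optional[Union[Const, str]]]]


def mode_of(right: Union[Const, str], memory: Memory) -> str:
    """Mode of the right side: a variable's mode, or int/bool for a constant."""
    if isinstance(right, str):
        return memory[right][0]
    # bool is tested first because bool is a subclass of int in Python
    return "bool" if isinstance(right, bool) else "int"


def assign(memory: Memory, left: str, right: Union[Const, str]) -> str:
    """Sketch of `left := right`, following cases (1) to (3) above."""
    statement = f"{left} := {right}"
    left_mode = memory[left][0]
    while True:
        is_id = isinstance(right, str)
        if is_id and memory[right][1] is None:
            # an undefined identifier satisfies neither (1) nor (2)
            return f"error ILLEGAL ASSIGNMENT {statement}"
        if left_mode == "ref " + mode_of(right, memory):
            # case (1): n_L - 1 = n_R, so the right side is stored
            memory[left] = (left_mode, right)
            return "ok"
        if not is_id:
            # a constant of the wrong mode is intercepted, cf. [115]
            return f"error ILLEGAL ASSIGNMENT {statement}"
        # case (2), cf. [114]: replace the identifier by its contents,
        # which has one ref fewer in its mode
        right = memory[right][1]
```

Under these assumptions, a variable X declared `ref int` (so of mode `ref ref int` in memory) gives `assign(memory, "X", 2) == "error ILLEGAL ASSIGNMENT X := 2"`, the same verdict reached for the example program in section 5.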

We will complete our discussion of assignment by providing a
detailed example of the operation of [112]. The first step is shown below: a
string that represents the current state of the interpreter has been matched
with the left side of [112]. The binding of pattern variables to substrings
is indicated by vertical alignment.
Figure 4.8. Transition Diagram
    id  :=  id2  ;  x       ;  id  ref  mode     box  y  ;  id2  mode     val  z
    A   :=  B    ;  memory  ;  A   ref  ref int  G       ;  B    ref int  9    ; ...
The  second  step  is  to  evaluate  the  right  side  of  [112]  using  the 
bindings  above.   The  result  is  given  on  the  second  line: 
    x       ;  id  ref  mode     id2  y  ;  id2  mode     val  z
    memory  ;  A   ref  ref int  B       ;  B    ref int  9    ; ...
If  A  and  B  had  appeared  in  memory  in  the  reverse  order,  [113]  would 
be  used  instead.   Note  that  B  must  be  defined  for  assignment  to  take  place; 
if  B  were  undefined  the  match  would  fail  since  the  pattern  variable  val  cannot 
take  undefined  as  a  value.   Note  also  that  the  variable  y  is  bound  to  the  empty 
string  because  A  and  B   occupy  adjacent  locations  in  memory. 
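The match-and-rebuild step can be imitated with Python regular expressions, where a named group plays the part of a pattern variable and a backreference such as `(?P=id)` enforces the rule that a variable must match the same string each time it occurs. The expression below is a hypothetical rendering of the left side of [112], assuming tokens separated by single spaces and identifiers written in capital letters; the negative lookahead stands in for the restriction that `val` cannot take `undefined` as a value.

```python
import re
from typing import Optional

# Hypothetical regex rendering of the left side of [112]:
#   id := id2 ; x ; id ref mode box y ; id2 mode val z
PATTERN_112 = re.compile(
    r"^(?P<id>[A-Z]+) := (?P<id2>[A-Z]+) ; "
    r"(?P<x>.*?) ; "
    r"(?P=id) ref (?P<mode>(?:ref )*(?:int|bool)) (?P<box>\S+) "
    r"(?P<y>.*?); "
    r"(?P=id2) (?P=mode) (?!undefined\b)(?P<val>\S+)"  # val may not be undefined
    r"(?P<z>.*)$"
)


def apply_112(state: str) -> Optional[str]:
    """Return the next interpreter state, or None if [112] does not match."""
    m = PATTERN_112.match(state)
    if m is None:
        return None
    g = m.groupdict()
    # right side of [112]: x ; id ref mode id2 y ; id2 mode val z
    return (f"{g['x']} ; {g['id']} ref {g['mode']} {g['id2']} {g['y']}; "
            f"{g['id2']} {g['mode']} {g['val']}{g['z']}")


state = "A := B ; memory ; A ref ref int G ; B ref int 9 ; ..."
print(apply_112(state))  # memory ; A ref ref int B ; B ref int 9 ; ...
```

With the entries for A and B swapped, or with `B ref int undefined`, `apply_112` returns None, mirroring the cases in which [113] must be used or no assignment takes place.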

Some care must be taken in writing semantic productions. For example,
if the second and third instances of ; were omitted from the pattern, the
following situation could occur:

    id  :=  id2  ;  x            id  ref  mode     box  y      id2  mode
    A   :=  B    ;  memory ; P   A   ref  ref int  G    ; M    B    ref int  ...
In  this  example,  the  statement  A  :=  B  assigns  B  to  the  variable  PA 
if  MB  is  defined,  regardless  of  the  mode  or  status  of  A  and  B. 
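The pitfall can be reproduced with the same kind of hypothetical regular-expression rendering: once the separators before `id` and before `id2` are deleted, each backreference is free to match the tail of a longer identifier. The state below is invented for the demonstration; neither A nor B has an entry of its own in its memory.

```python
import re

# Hypothetical faulty variant of [112]: the second and third ";" are omitted,
# so nothing forces id and id2 to start at the beginning of a memory entry.
FAULTY_112 = re.compile(
    r"^(?P<id>[A-Z]+) := (?P<id2>[A-Z]+) ; "
    r"(?P<x>.*?)(?P=id) ref (?P<mode>(?:ref )*(?:int|bool)) (?P<box>\S+)"
    r"(?P<y>.*?)(?P=id2) (?P=mode) (?!undefined\b)(?P<val>\S+)(?P<z>.*)$"
)

state = "A := B ; memory ; PA ref ref int G ; MB ref int 9 ; ..."
m = FAULTY_112.match(state)
print(repr(m.group("x")), repr(m.group("y")))  # 'memory ; P' ' ; M'
```

The bindings x = `memory ; P` and y = ` ; M` are exactly the ones shown above: the pattern treats PA as A and MB as B.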
Auxiliary Functions

The auxiliary functions defined in Figures 4.5, 4.6, and 4.7 require
fewer productions than the interpreter I but make use of recursion. To prove
that the recursion terminates is not difficult: we simply note that E is applied
to fewer symbols at each successive call, and that the second argument of Plus
and Times is decremented at each call until it reaches zero and a value is
returned. Note that production [E3] is applied repeatedly to obtain the
primitive value of an identifier; since the semantics of ASPLE rule out circular
chains of pointers, the declaration of id need not be passed on to the next call
of E.
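The termination argument for Plus and Times can be illustrated with a recursive sketch in which the second argument shrinks by one at each call. Figures 4.5 to 4.7 are not reproduced in this section, so the definitions below are hypothetical stand-ins: they assume non-negative integers and use Python integer arithmetic for the unit steps.

```python
def plus(a: int, b: int) -> int:
    """Hypothetical Plus: move one unit from b to a until b reaches zero."""
    assert a >= 0 and b >= 0, "sketch assumes non-negative arguments"
    if b == 0:
        return a                      # base case: a value is returned
    return plus(a + 1, b - 1)         # second argument decremented


def times(a: int, b: int) -> int:
    """Hypothetical Times: add a once for each unit of b."""
    assert a >= 0 and b >= 0, "sketch assumes non-negative arguments"
    if b == 0:
        return 0
    return plus(times(a, b - 1), a)   # second argument decremented


print(plus(3, 4), times(3, 4))  # 7 12
```

Every recursive call receives a strictly smaller second argument, so the depth of recursion is bounded by that argument, which is the same measure the text uses.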
5. Evaluation

A number of criteria for evaluating formal definition techniques
are proposed in [3]. In particular, the authors point out that an important
measure of a formal definition technique is its ability to provide the
answer to detailed questions about the language it describes. In [3], a sample
question is posed and each of four definitions is used to answer it. For
purposes of comparison, we will show how the same question is answered by
the definition in section 4. The remainder of this section is a detailed
comparison of the LINGOL and W-grammar approaches to language definition.

A question that might be posed about ASPLE is: In the example
program below, is the assignment of an integer constant to the variable X
valid?

begin 

ref  int  X; 

X  :=  2 
end 

To  answer  the  question,  we  execute  the  program  starting  with  the 
initial  state 

begin ref int X; X := 2 end memory ; ...

We ignore the input and output files since they are not used.
Applying the interpreter productions [106] and [107], we obtain successively
the states

ref int X; X := 2; memory ; ...

X := 2; memory ; X ref ref int undefined; ...

Now we examine the productions for assignment. [110] does not
apply, since it requires the variable on the left to have mode ref int; the
next rule that admits an integer constant on the right of the assignment is
[115], and applying it we obtain the error state below. The assignment is
clearly invalid.

memory ; X ref ref int undefined; ... error ILLEGAL ASSIGNMENT X := 2
Comparison  with  W-grammars 

As with LINGOL, the W-grammar method is based upon strings of
symbols and rewrite rules, and this similarity suggests that a comparison
between the two will be especially meaningful. An obvious comparison can
be made by counting rewrite rules; if we do so we find that the SIBYL
definition requires 43 syntax productions and a total of 77 semantic
productions, while the W-grammar definition in [3] requires 38 context-free
productions (metaproductions) and 100 additional productions (hyperrules),
not including the 22 productions of a standard context-free syntax for
ASPLE.

This  comparison  is  overly  simplistic  for  several  reasons.   First, 
the  semantic  productions  differ  greatly  in  complexity;  one  fairly  elaborate 
hyperrule  can  be  equivalent  to  several  simpler  LINGOL  productions,  and  both 
descriptions  contain  sequences  of  trivial  productions.   Second,  there  are 
differences  of  style  as  well  as  notation.   The  authors  of  the  W-grammar 
definition  have  attempted  to  separate  the  context-sensitive  and  semantic 
aspects  of  ASPLE;  in  the  LINGOL  definition  they  are  intertwined. 

A  more  fundamental  difference  is  that  the  LINGOL  definition  is 
operational  while  the  W-grammar  definition  is  essentially  axiomatic:   In 
effect,  a  computation  must  be  deduced  from  a  set  of  relations  rather  than 
generated  by  an  algorithm. 

We  will  compare  the  two  methods  by  applying  them  both  to  a  simple 
class  of  expressions.   First,  however,  we  must  lay  the  notational  groundwork 
for  a  description  of  W-grammars  and  their  semantics. 
We begin by using $L_1$ to define another syntactic metalanguage W.
The nonterminal and terminal symbols of grammars in W are defined as follows:
any non-empty sequence of lower-case letters followed by a comma is a
nonterminal; the symbols 0, T, F, +, ( and ) are terminal symbols.
    W → Production+

    Production → N '→' Form
    x, y, z: Form → (N | T)*                          (5.1)

    n: N → (a | b | ... | z)+ ','

    T → 0 | T | F | + | ( | )

A  sentence  of  W  is  shown  below.   Since  we  choose  to  regard  it  as 
a  grammar  rather  than  a  character  string,  it  is  not  underlined  and  spaces 
are  inserted  for  readability.   It  defines  a  language  containing  two  kinds 
of  expressions,  integer  and  Boolean:   the  type  of  an  expression  is  the 
same  as  the  types  of  its  operands. 

    exp,      → intexp,
    exp,      → boolexp,
    intexp,   → ( intexp, + intexp, )
    boolexp,  → ( boolexp, + boolexp, )               (5.2)
    intexp,   → 0
    boolexp,  → T
    boolexp,  → F

Two  integer  expressions  and  a  Boolean  expression  are  shown 
below: 

0     (0+0)     (T+(F+T)) 
Now we introduce a new metanotation consisting of sequences of
productions of the form p ⇒ e, where p and e have the same syntax as $L_2$
patterns and expressions. A sequence of these productions defines a binary
relation on strings rather than a function, since a string may have any
number of successors. Only the first two of the defining rules for $L_2$
must be satisfied:

(1)  Every  variable  in  p  or  e  must  be  associated  with  a 
syntactic  class. 

(2)  A  variable  must  match  the  same  string  of  symbols 
each  time  it  occurs. 

We can use these productions to model the semantics of various
classes of grammars, including W-grammars and grammars in $L_1$. For example,
the meaning of grammar (5.2) is defined by the relation R1 below.

    R1:  exp,      ⇒ intexp,
         exp,      ⇒ boolexp,
         intexp,   ⇒ ( intexp, + intexp, )
         boolexp,  ⇒ ( boolexp, + boolexp, )          (5.3)
         intexp,   ⇒ 0
         boolexp,  ⇒ T
         boolexp,  ⇒ F

The relation R1 determines a larger relation D1 (derives), defined
by

    x n z  D1  x y z    iff    n R1 y

where n is a nonterminal from the class N defined in (5.1) and x, y, and z
are members of Form. The class of expressions defined by (5.3) is the set
of terminal strings derivable from the nonterminal exp,: the set of strings y
such that $y \in T^*$ and

$$\mathtt{exp,}\ \mathrm{D1}\ x_1\ \mathrm{D1}\ x_2\ \mathrm{D1}\ \cdots\ \mathrm{D1}\ x_n\ \mathrm{D1}\ y \qquad \text{for some } x_1, \ldots, x_n \in \mathit{Form}.$$

For example,

    exp, D1 intexp, D1 (intexp,+intexp,) D1 (0+intexp,) D1 (0+0)
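The definitions of R1 and D1 are concrete enough to execute. The Python below is a minimal sketch, assuming forms are tuples of symbols and that every terminal of (5.1) is a single character; the names `R1`, `d1_successors` and `derivable` are illustrative. Because no production of (5.3) shortens a form, a breadth-first search over D1 that discards forms longer than the candidate sentence always terminates, so it decides membership in the class of expressions.

```python
from collections import deque
from typing import Dict, Iterator, List, Tuple

Form = Tuple[str, ...]

# The relation R1 of (5.3); a symbol ending in "," is a nonterminal.
R1: Dict[str, List[Form]] = {
    "exp,": [("intexp,",), ("boolexp,",)],
    "intexp,": [("(", "intexp,", "+", "intexp,", ")"), ("0",)],
    "boolexp,": [("(", "boolexp,", "+", "boolexp,", ")"), ("T",), ("F",)],
}


def d1_successors(form: Form) -> Iterator[Form]:
    """Every x y z with form = x n z and n R1 y, i.e. one step of D1."""
    for i, symbol in enumerate(form):
        if symbol.endswith(","):
            for y in R1[symbol]:
                yield form[:i] + y + form[i + 1:]


def derivable(sentence: str) -> bool:
    """Is the terminal string `sentence` derivable from exp, by D1?"""
    goal: Form = tuple(sentence)      # each terminal of (5.1) is one character
    start: Form = ("exp,",)
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == goal:
            return True
        for nxt in d1_successors(form):
            # no production of (5.3) shortens a form, so longer forms are hopeless
            if len(nxt) <= len(goal) and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


print([derivable(s) for s in ["0", "(0+0)", "(T+(F+T))", "(0+T)"]])
# [True, True, True, False]
```

The mixed expression (0+T) is rejected because no single nonterminal derives both an integer and a Boolean operand.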

Because  our  new  metanotation  admits  string-valued  variables  as 
well  as  string  literals,  we  can  give  a  somewhat  more  compact  definition 
of  the  relation  Rl,  as  follows: 

    Intbool → int | bool

    R1:  exp,          ⇒ intbool exp,
         intbool exp,  ⇒ ( intbool exp, + intbool exp, )        (5.4)
         intexp,       ⇒ 0
         boolexp,      ⇒ T | F
This is an example of a two-level grammar or W-grammar. The
first-level grammar defines a set Intbool of modes, and the second-level
grammar uses the variable intbool to avoid writing a production for each mode
of expression. Since the set Intbool could have been defined to contain an
infinite number of modes, we see that a two-level grammar can be used to
represent an infinite number of context-free productions. The last production
uses our standard abbreviation for two productions having the same left
side.
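How the first-level grammar multiplies the second-level rules can be sketched as consistent substitution: every occurrence of intbool in a rule receives the same member of Intbool, as rule (2) requires. The Python below is a hypothetical illustration that handles only the single variable intbool and a finite Intbool; with the two modes int and bool it regenerates the seven productions of (5.3). Spaces are removed after substitution, on the assumption that they serve only readability, so that `int exp,` and `intexp,` name the same nonterminal.

```python
from typing import List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...]]

INTBOOL = ["int", "bool"]  # first level: Intbool -> int | bool

# Second level, (5.4), with "intbool" as the only variable.
HYPERRULES: List[Rule] = [
    ("exp,", ("intbool exp,",)),
    ("intbool exp,", ("(", "intbool exp,", "+", "intbool exp,", ")")),
    ("intexp,", ("0",)),
    ("boolexp,", ("T",)),
    ("boolexp,", ("F",)),
]


def substitute(symbol: str, mode: str) -> str:
    # replace the variable, then drop the spaces used only for readability
    return symbol.replace("intbool", mode).replace(" ", "")


def expand(hyperrules: List[Rule], modes: List[str]) -> Set[Rule]:
    """Context-free productions represented by a two-level grammar."""
    rules: Set[Rule] = set()
    for lhs, rhs in hyperrules:
        uses_var = any("intbool" in s for s in (lhs,) + rhs)
        # a rule without the variable stands for exactly one production
        for mode in (modes if uses_var else [""]):
            rules.add((substitute(lhs, mode), tuple(substitute(s, mode) for s in rhs)))
    return rules


for lhs, rhs in sorted(expand(HYPERRULES, INTBOOL)):
    print(lhs, "=>", " ".join(rhs))
```

Because the substitution is consistent, no production such as `( intexp, + boolexp, )` is ever generated; an infinite Intbool would make `expand` produce an infinite set, which is why the two-level form is needed.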

The  same  two-level  grammar  expressed  in  a  more  standard  notation 
is  shown  below.   We  have  followed  the  lead  of  [3]  in  using  '+'  instead  of 
'plus  symbol'  to  denote  the  terminal  symbol  +,  and  similar  abbreviations 
for  the  other  terminals. 
    INTBOOL :: int; bool.

    exp: INTBOOL exp.

    INTBOOL exp: (, INTBOOL exp, +, INTBOOL exp, ).

    int exp: 0.

    bool exp: T; F.

We can use a two-level grammar to define the semantics of
expressions as well as their syntax, but to do so we must adopt a different
strategy. The first-level grammar will be used to generate an infinite
set of nonterminals that includes as a proper subset an encoding of all
legal expressions. For example, the expression (0+0) is encoded as the
nonterminal int left zero plus zero right,. As before, we define a
relation between nonterminals and forms (R2), and extend it to a derivation
relation (D2); but this time the set of terminal strings derivable from
the nonterminal exp, is the set of expressions with their values. In the
example derivation below, the initial choice of nonterminals permits the
derivation of the terminal string 0 0; hence 0 is an expression and 0 is
its value.

    exp, D2 int zero, int zero, eval zero giving zero, D2 ... D2 0 0

The first-level grammar and the second-level rules (called
hyperrules) that generate the set of legal expressions are given below.

    Intbool → int | bool
    Exp     → left Exp plus Exp right | Value
    Value   → zero | true | false

    R2:  exp,  ⇒ intbool exp, intbool value, eval exp giving value,
         intbool left exp1 plus exp2 right,                        (5.5)
               ⇒ ( intbool exp1, + intbool exp2, )
         int zero,    ⇒ 0
         bool true,   ⇒ T
         bool false,  ⇒ F
The semantics of expression evaluation is defined by the
additional hyperrules given below. These rules ensure that a nonterminal
of the form

    eval exp giving value,

will derive a terminal string (the empty string) only when intbool exp,
derives an expression and intbool value, derives the value of that
expression.

    eval value giving value,  ⇒

    eval left exp1 plus exp2 right giving value,
          ⇒ eval exp1 giving value2,
            eval exp2 giving value3,
            where value equals value2 plus value3,             (5.6)

    where zero equals zero plus zero,     ⇒
    where true equals true plus value,    ⇒
    where true equals value plus true,    ⇒
    where false equals false plus false,  ⇒

Notice that most of the hyperrules above derive the empty
string. To illustrate their use, we sketch the derivation of the
expression (0+0) with value 0:

    exp,  D2  int left zero plus zero right, int zero,
              eval left zero plus zero right giving zero,
          D2 ... D2  (0+0) 0 eval zero giving zero, eval zero giving zero,
              where zero equals zero plus zero,
          D2 ... D2  (0+0) 0

The first two hyperrules of (5.6) are equivalent to the two axioms
below. The statement Eval(value) = value is true because the nonterminal
eval value giving value, generates the empty string. The left and right sides
of the second axiom are logically equivalent because the left side of
the corresponding hyperrule generates the empty string only when the right
side does.

    Eval(value) = value

    Eval(left exp1 plus exp2 right) = value
          iff  Eval(exp1) = value2  and                       (5.7)
               Eval(exp2) = value3  and
               Plus(value2, value3) = value
The axioms (5.7) provide a recursive definition of the string-valued
function Eval. The same definition written in $L_2$ would look like this:

    Eval:  value                      → value
           left exp1 plus exp2 right  → Plus(Eval(exp1), Eval(exp2))
A syntactic and semantic definition equivalent to the W-grammar
definition in (5.5) and (5.6) is given below using $L_1$ and $L_2$. In this
case Eval operates on concrete rather than abstract or encoded expressions,
so its definition is somewhat shorter.

    Exp   → ( Exp + Exp ) | Value
    Value → 0 | Bool
    Bool  → T | F

    Eval:  value            → value
           ( exp1 + exp2 )  → Plus(Eval(exp1), Eval(exp2))

    Plus:  (0, 0)     → 0
           (T, bool)  → T
           (bool, T)  → T
           (F, F)     → F
The class of legal expressions is defined to be the subset of
Exp whose members are mapped to values by the function Eval. To enable
the function Plus to discriminate between legal and illegal expressions, a
production for the class Bool of Booleans has been included in the
first-level grammar.
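Because Eval works on concrete strings, the $L_2$ definition above translates almost line for line into Python. In this sketch the names `eval_exp`, `plus` and `split_sum` are hypothetical, None plays the role of "no production matches", and `split_sum` does by hand the decomposition that matching against ( exp1 + exp2 ) performs in LINGOL. An expression is legal exactly when `eval_exp` returns a value, mirroring the definition of the class of legal expressions.

```python
from typing import Optional, Tuple

VALUES = ("0", "T", "F")


def plus(a: Optional[str], b: Optional[str]) -> Optional[str]:
    """The productions of Plus, tried in order; None where none applies."""
    if (a, b) == ("0", "0"):
        return "0"
    if a == "T" and b in ("T", "F"):    # (T, bool) -> T
        return "T"
    if a in ("T", "F") and b == "T":    # (bool, T) -> T
        return "T"
    if (a, b) == ("F", "F"):
        return "F"
    return None


def split_sum(s: str) -> Optional[Tuple[str, str]]:
    """Split "(exp1+exp2)" at its top-level "+"; None if s has another shape."""
    if len(s) < 5 or s[0] != "(" or s[-1] != ")":
        return None
    depth = 0
    for i in range(1, len(s) - 1):
        if s[i] == "(":
            depth += 1
        elif s[i] == ")":
            depth -= 1
        elif s[i] == "+" and depth == 0:
            return s[1:i], s[i + 1:-1]
    return None


def eval_exp(s: str) -> Optional[str]:
    """Eval on concrete expressions; None marks an illegal expression."""
    if s in VALUES:                     # value -> value
        return s
    parts = split_sum(s)
    if parts is None:
        return None
    return plus(eval_exp(parts[0]), eval_exp(parts[1]))


for e in ["(0+0)", "(T+(F+T))", "(F+F)", "(0+T)"]:
    print(e, eval_exp(e))
# (0+0) 0
# (T+(F+T)) T
# (F+F) F
# (0+T) None
```

On Booleans the table for Plus yields T whenever either operand is T and F only for (F, F), so + behaves as logical or; a mixed pair such as (0, T) matches no production, which is what makes (0+T) illegal.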
6. Verification

Because  formal  language  definitions  tend  to  be  large  and 
complex,  and  because  they  presently  must  be  checked  by  hand  rather  than  by  a 
compiler  or  interpreter,  typographic  and  logical  errors  have  a  way  of 
creeping  in  and  remaining  undetected.   Clearly,  it  is  important  to 
identify  those  aspects  of  a  definition  that  can  be  checked  in  a  routine 
manner,  and  to  develop  mechanical  means  of  verification  wherever  possible. 
For  LINGOL,  several  forms  of  verification  are  possible. 

An obvious first step is to verify that a definition is
well-formed. It must satisfy the context-free syntax of $L_1$ and $L_2$ and the
context-sensitive restrictions given in rules (1) and (3) of section 2:
every variable in an $L_2$ production must be defined by an $L_1$ production,
and every variable on the right of an $L_2$ production must also appear on
the left.

As  a  second  step,  we  can  attempt  to  verify  that  functions  have  the 
expected  domain  and  range;  see  (4.1).  In  checking  a  function  we  make  use  of 
the  properties  of  other  functions.   For  example,  the  assertion  that  the 
range  of  E  is  the  set  Con  rests  on  the  assertion  that  Con  is  the  range  of 
the  functions  Plus,  Times,  Equal,  and  Unequal. 

The domain of E is actually a superset of (Exp, State), namely
the set (Exp, Lexemes). To verify this, we note that the domains of the
productions [E1] through [E8] correspond to the leaf nodes of a tree generated
from (Exp, Lexemes) by applying the syntax productions [B12] through [B15]
(see figure 6.1). Since the grammar for expressions is unambiguous, the
leaves of the tree form a partition of (Exp, Lexemes). Production [E3] is
omitted since its domain is a subset of the domain of [E4].
    (Exp, Lexemes)
      |--> (Exp + Factor, Lexemes)                      [E1]
      |--> (Factor, Lexemes)
             |--> (Factor * Primary, Lexemes)           [E2]
             |--> (Primary, Lexemes)
                    |--> (Id, Lexemes)                  [E4]
                    |--> (Constant, Lexemes)            [E5]
                    |--> (( Exp ), Lexemes)             [E6]
                    |--> (( Compare ), Lexemes)
                           |--> (( Exp = Exp ), Lexemes)    [E7]
                           |--> (( Exp ≠ Exp ), Lexemes)    [E8]

Figure 6.1. Tree of Alternative Derivations


We can compute the domains of a set of productions by taking
their left sides and replacing each variable with the name of the syntactic
class it denotes. If we express the result as a Venn diagram like the
ones shown in figure 6.2, it is easy to determine which productions can be
interchanged without affecting the definition (those with disjoint domains),
which can be removed without affecting the definition (those whose domains
are contained in the domain of an earlier production), and which can be
removed without changing the domain of the function (those whose domains are
contained in the domain of a later production).
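Once domains have been computed, the three checks are plain set comparisons. The sketch below is hypothetical in two ways: it represents each domain by a finite Python set, whereas real domains are usually infinite sets of strings that would have to be compared through their syntactic classes, and the production names and sample domains in the usage example are invented for illustration.

```python
from typing import Dict, List, Set, Tuple

Production = Tuple[str, Set[str]]  # (name, domain), in definition order


def classify(prods: List[Production]) -> Dict[str, list]:
    report: Dict[str, list] = {"interchangeable": [], "redundant": [], "same_domain": []}
    for i, (name, dom) in enumerate(prods):
        for other, odom in prods[i + 1:]:
            if dom.isdisjoint(odom):
                # disjoint domains: the relative order of the two is irrelevant
                report["interchangeable"].append((name, other))
        if any(dom <= d for _, d in prods[:i]):
            # an earlier production matches every argument first: never applied
            report["redundant"].append(name)
        if any(dom <= d for _, d in prods[i + 1:]):
            # a later production covers the domain: removal keeps the domain
            report["same_domain"].append(name)
    return report


toy = [("p1", {"a", "b"}), ("p2", {"b"}), ("p3", {"c"}), ("p4", {"c", "d"})]
print(classify(toy))
```

In this toy input p2 plays the role of the production that would never be applied (like [108] placed after [109]), while p3 plays the role of [E3], whose removal leaves the domain of the function unchanged.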
[Venn diagram of the domains of productions [E1] through [E8]]

Figure 6.2. Domains of Productions

Some of the information in figure 6.2 can be mechanically
generated (or verified) using a tree of derivations like the one in figure
6.1. The fact that productions [112] and [113] have disjoint domains
cannot be established in this way, since it depends on the context-sensitive
property of ASPLE that no variable can be declared twice (and thus id cannot
both precede and follow id2 in memory).

Having computed the domains and ranges of the defining
productions for the interpreter, we can construct a transition diagram
resembling the one in figure 4.8. Transition diagrams are a useful abstraction
that reveals properties both of the definition (for example, the fact that
while and input are defined in terms of if and assignment) and of the defined
language (for example, the fact that assignment can never be a
non-terminating computation).

An important property of a definition is locality: it is easier
to trace the execution of a statement through the definition if the
productions involved are closely grouped, preferably on the same page of the
defining document. We can use a transition diagram to identify rules that
should be rearranged, and a Venn diagram to determine whether the
rearrangement is possible.
Since LINGOL descriptions are operational definitions, they
can be used to guide the execution of an example program for the language
being defined. If assignments actually assign and loops really loop,
we have some additional assurance that the definition describes the
language we intended.

The  process  of  executing  test  programs  and  generating  sample 
computations  can  be  mechanized,  and  in  fact  this  has  been  done  in  a 
limited  way.   A  portion  of  the  definition  of  SIBYL  was  transformed  into 
a  SNOBOL  program  which  was  then  applied  to  some  sample  computations;  as 
a  result  several  errors  were  detected  in  the  original  definition. 

Not surprisingly, the SNOBOL implementation was extremely
inefficient. We can do much better by building an interpreter or compiler
for LINGOL definitions that takes advantage of their structure. For
example, the search for a matching production can be greatly sped up
if, for each production, we examine the parse tree(s) of the current
state rather than the underlying character string. If productions are
implemented as transformations on parse trees, we can minimize the
amount of parsing and string manipulation required.

We can also make use of the fact that productions can be mapped
onto a state transition diagram. We need not scan all 29 productions of
the ASPLE interpreter (or the 108 productions that define SIBYL); instead,
we can limit the matching process at each cycle to just those productions
reachable from the current state.
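The idea can be sketched as an interpreter loop that keeps, for each production, the list of productions that may follow it in the transition diagram, and at each cycle tries only those. Everything here is a hypothetical toy: the two string rules, the successor table, and the names `run`, `deref` and `store` only echo the shape of [114] and [110]. The restriction is safe only if each successor list preserves the original order of the productions and the diagram is complete; otherwise a production that a full scan would have chosen could be skipped.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

Rule = Callable[[str], Optional[str]]  # returns the new state, or None if no match


def store(s: str) -> Optional[str]:
    # toy analogue of storing a constant: "X := 7" becomes "X = 7"
    m = re.fullmatch(r"(\w+) := (\d+)", s)
    return f"{m.group(1)} = {m.group(2)}" if m else None


def deref(s: str) -> Optional[str]:
    # toy analogue of dereferencing: strip one "*" from the right side
    return s.replace(":= *", ":= ", 1) if ":= *" in s else None


RULES: Dict[str, Rule] = {"store": store, "deref": deref}
SUCCESSORS: Dict[str, List[str]] = {"deref": ["store", "deref"], "store": []}


def run(state: str, start: List[str], max_steps: int = 1000) -> Tuple[str, List[str], int]:
    candidates, trace, attempts = start, [], 0
    for _ in range(max_steps):
        for name in candidates:          # only productions reachable from here
            attempts += 1
            new_state = RULES[name](state)
            if new_state is not None:
                state, candidates = new_state, SUCCESSORS[name]
                trace.append(name)
                break
        else:                            # nothing reachable matches: halt
            return state, trace, attempts
    raise RuntimeError("step limit reached")


print(run("X := **7", ["store", "deref"]))
# ('X = 7', ['deref', 'deref', 'store'], 5)
```

After `store` fires, its successor list is empty, so the loop halts without trying any production at all; in a full scan every production would be tried once more before the interpreter could stop.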
[Diagram of the string ABCD := 15, with parts labelled Descriptors (Id, Oper), Characters, and Binary Word]

Figure 6.3. Internal Representation of Strings


Finally, efficient hand-coded versions of standard functions
like integer addition can be provided in a library. A further step is
to encode lexemes of standard types in a way that facilitates processing.
In figure 6.3 (b), for example, the string '15' of type Int has been
encoded in binary form. If we continued this process of replacing strings
and string functions with storage structures and hand-coded subroutines,
our formal definition would gradually evolve into an interpretive
implementation of the language.
7. Summary

The  use  of  string  transformations  in  semantic  definitions 
appears  to  have  several  advantages:   Definitions  can  be  written  that 
are  reasonably  compact  and  readable,  at  least  by  comparison  with  some 
existing  formal  approaches.   Semantic  productions  can  be  grouped 
to  form  a  highly  modular  description.   The  semantic  metalanguage  is 
simple  and  easy  to  learn. 

In addition, the notation lends itself to mechanical
verification. Because definitions are operational rather than axiomatic, they
can be used to drive an interpreter that generates example computations.

Basing the definition on concrete programs represented
by character strings rather than abstract programs represented by,
say, labelled parse trees offers advantages as well as disadvantages.
On the one hand, computations can be represented compactly and the
reader is spared the effort of translating between concrete and abstract
syntax. On the other hand, questions of syntax may become entangled
with semantics, and care must be taken to avoid unintended results
in the string transformation rules. In general, more of the burden
is placed on the authors of a definition and less on the users.
Assuming that the latter outnumber the former, this seems like a
reasonable choice.